Speed up crsql_changes merges ~8x by sinkingsugar · Pull Request #7 · shards-lang/cr-sqlite

sinkingsugar · 2026-06-13T01:19:54Z

Why

Importing CRDT diffs (INSERT INTO crsql_changes) was slow enough that apps resorted to hand-rolled batching hacks. Profiling showed that, per merged change, the extension paid for a full sqlite3_prepare/finalize cycle and ~7 redundant statement executions. The dominant cost, however, was that three hot statements used RETURNING, which makes SQLite materialize an ephemeral btree with its own pager + page cache on every executed row (visible as massive sys-time / VM churn). On the clock-table statement, the RETURNING clause merely echoed back the key that had already been bound as parameter 1.

What

- Drop RETURNING from all hot-path statements: winner clock (use the bound key), pk-lookaside inserts (__crsql_key is a rowid alias → sqlite3_last_insert_rowid), site-id ordinal insert, and the as_crr backfill path
- No more per-row temp prepare: crsql_get_or_create_key_packed() binds unpacked PK ColumnValues directly into the cached key statements
- Sync bit via direct int* (ExtData.syncBitPtr) instead of stepping SELECT crsql_internal_sync_bit(x) twice per change
- Last-row memo (table, pk) → (lookaside key, causal length): changesets arrive ordered by (db_version, seq), so the N column changes of one row hit it back-to-back. Invalidated on commit/rollback hooks, savepoint rollback (new vtab xRollback/xRollbackTo, module iVersion 2), local-write trigger entry, table-info reloads, compact_post_alter, and any merge error (statement-journal undo)
- Site-id → ordinal memo (changesets are virtually always single-site)
- crsql_next_db_version computed in C and bound as a value; skips the PRAGMA data_version probe when pendingDbVersion is already established in the open transaction
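The last-row memo described above can be pictured as a one-entry cache. The following is a minimal sketch, not the code from this PR: the `RowMemo` struct, its field names, and the three `row_memo_*` functions are hypothetical, and the packed primary key is treated as an opaque byte string.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical one-entry memo: (table, packed pk) -> (lookaside key, causal length).
 * Must start zero-initialized, e.g. `RowMemo m = {0};`. */
typedef struct RowMemo {
  int valid;
  char table[64];
  unsigned char *pk; /* owned copy of the packed pk bytes */
  size_t pkLen;
  int64_t key;
  int64_t causalLength;
} RowMemo;

/* Drop the cached entry. In the PR this happens on commit/rollback,
 * savepoint rollback, local writes, table-info reloads and merge errors. */
void row_memo_invalidate(RowMemo *m) {
  assert(m != NULL);
  free(m->pk);
  m->pk = NULL;
  m->pkLen = 0;
  m->valid = 0;
}

/* Remember the row that was just resolved. Returns 1 if cached, 0 if skipped. */
int row_memo_store(RowMemo *m, const char *table, const unsigned char *pk,
                   size_t pkLen, int64_t key, int64_t causalLength) {
  row_memo_invalidate(m);
  if (strlen(table) >= sizeof m->table) return 0; /* name too long: do not cache */
  unsigned char *copy = malloc(pkLen ? pkLen : 1);
  if (copy == NULL) return 0;
  if (pkLen) memcpy(copy, pk, pkLen);
  strcpy(m->table, table);
  m->pk = copy;
  m->pkLen = pkLen;
  m->key = key;
  m->causalLength = causalLength;
  m->valid = 1;
  return 1;
}

/* Returns 1 and fills the outputs on a hit, 0 on a miss. */
int row_memo_lookup(const RowMemo *m, const char *table,
                    const unsigned char *pk, size_t pkLen,
                    int64_t *key, int64_t *causalLength) {
  if (!m->valid || m->pkLen != pkLen) return 0;
  if (strcmp(m->table, table) != 0) return 0;
  if (pkLen && memcmp(m->pk, pk, pkLen) != 0) return 0;
  *key = m->key;
  *causalLength = m->causalLength;
  return 1;
}
```

Because changes for one row arrive consecutively, a single entry is enough in this sketch: the first column change of a row misses and stores, and the remaining column changes hit. Any event that could make the cached key or causal length stale calls `row_memo_invalidate`.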

Results (Release, Apple Silicon, single transaction)

Scenario	Before	After	Speedup
Fresh import, 80k changes / 20k rows	0.943s	0.111s	8.5×
Idempotent re-import (all changes lose)	0.179s	0.063s	2.8×
400k changes / 100k rows	~4.7s	0.64s (~625k changes/s)	~7×

Remaining profile is genuine btree work (column upsert + clock insert).
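The speedup and throughput figures in the table follow directly from the timings. As a small sanity check, assuming nothing beyond the numbers listed above (the helper names are hypothetical):

```c
#include <assert.h>

/* Ratio of the old wall time to the new wall time. */
double speedup(double before_s, double after_s) {
  assert(after_s > 0.0);
  return before_s / after_s;
}

/* Merged changes per second of wall time. */
double changes_per_second(double changes, double seconds) {
  assert(seconds > 0.0);
  return changes / seconds;
}
```

With the table's inputs, `speedup(0.943, 0.111)` is about 8.5, `speedup(0.179, 0.063)` is about 2.8, and `changes_per_second(400000.0, 0.64)` is about 625,000.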

Testing

- C unit suite passes; adds testRowMemoSavepointRollback covering memo invalidation across ROLLBACK TO (written with NDEBUG-immune checks, since assert is compiled out in Release)
- All 155 Python correctness tests pass
- Clean under Guard Malloc (make asan currently hangs in the ASAN runtime's own init on recent Xcode; pre-existing, unrelated)
- Two-peer convergence verified end-to-end: concurrent conflicting inserts/updates/deletes across two tables, full + delta (db_version > ?) exchanges → byte-identical peers
- core/test/perf/bench-import.sh added for repeatable measurement
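To illustrate what an "NDEBUG-immune" check means in practice, here is a minimal sketch. The `CHECK` macro, the failure counter, and the helper functions are hypothetical and are not taken from the test suite; the point is only that the checked expression is always evaluated, whereas `assert` disappears entirely when `NDEBUG` is defined.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical failure counter; a real test might abort or return non-zero instead. */
int g_checkFailures = 0;

/* Unlike assert(), this macro does not depend on NDEBUG: the condition is
 * always evaluated and a failure is always recorded. */
#define CHECK(cond)                                                \
  do {                                                             \
    if (!(cond)) {                                                 \
      fprintf(stderr, "%s:%d: check failed: %s\n", __FILE__,       \
              __LINE__, #cond);                                    \
      g_checkFailures++;                                           \
    }                                                              \
  } while (0)

/* Helper with a side effect, so we can observe whether it ran. */
int bump(int *counter) { return ++*counter; }

/* Always returns 1: the call inside CHECK runs in every build mode. */
int run_checked(void) {
  int calls = 0;
  CHECK(bump(&calls) == 1);
  return calls;
}

/* Returns 1 in a debug build but 0 when compiled with -DNDEBUG,
 * because the whole assert expression is compiled out. */
int run_asserted(void) {
  int calls = 0;
  assert(bump(&calls) == 1);
  return calls;
}
```

A test whose setup or verification lives inside `assert(...)` would silently do nothing in a Release build, which is why the regression test avoids relying on it.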

Per merged change, the insert path paid a full prepare/finalize cycle, several redundant statement executions and - worst of all - three hot statements used RETURNING, which makes SQLite materialize an ephemeral btree (with its own pager and page cache) on every executed row.

- Drop RETURNING from winner clock, pk lookaside, site_id ordinal and backfill inserts; read values from bound params or last_insert_rowid
- Bind unpacked pks directly to the key lookaside statements instead of round-tripping through a temporary 'SELECT ?,?,...' prepare per row
- Toggle the sync bit through a direct pointer instead of executing SELECT crsql_internal_sync_bit(x) statements per change
- Memoize (table, pk) -> (lookaside key, causal length) of the last merged row; changesets are ordered by (db_version, seq) so the N column changes of a row hit the memo back-to-back. Invalidated on commit/rollback, savepoint rollback (new vtab xRollback/xRollbackTo, module iVersion 2), local CRR writes, table info reloads, compaction and on any merge error (statement journal undo)
- Memoize the last site_id -> ordinal resolution (changesets are virtually always single-site)
- Compute next_db_version in C, bound as a value, and skip the PRAGMA data_version probe when pendingDbVersion is already set for the open transaction

Import of 80k changes (20k rows): 0.94s -> 0.11s. Idempotent re-import: 0.18s -> 0.06s.

Adds a savepoint-rollback regression test (with NDEBUG-immune asserts) and test/perf/bench-import.sh.
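The sync-bit change replaces two statement executions per change with two plain stores. A rough sketch of the idea follows, assuming the bit signals to the local-write path that a merge is in progress. `ExtDataSketch`, `merge_with_sync_bit`, `Probe`, and `probe_apply` are hypothetical names; only the `syncBitPtr` field name comes from the PR description.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the extension's per-connection data. */
typedef struct ExtDataSketch {
  int *syncBitPtr; /* points at the storage the sync-bit SQL function reads */
} ExtDataSketch;

typedef int (*ApplyFn)(void *arg);

/* Set the bit, apply one merged change, clear the bit again.
 * No SQL statement is stepped just to toggle the flag. */
int merge_with_sync_bit(ExtDataSketch *ext, ApplyFn apply, void *arg) {
  assert(ext != NULL && ext->syncBitPtr != NULL && apply != NULL);
  *ext->syncBitPtr = 1;
  int rc = apply(arg);
  *ext->syncBitPtr = 0; /* cleared on success and on error alike */
  return rc;
}

/* Hypothetical probe: records the bit value a write would observe mid-merge. */
typedef struct Probe {
  const int *bit;
  int seen;
} Probe;

int probe_apply(void *arg) {
  Probe *p = arg;
  p->seen = *p->bit;
  return 0;
}
```

In this sketch the callback sees the bit as 1 while the change is applied and the bit is back to 0 afterwards, which is the same observable behaviour as stepping the toggle statement before and after each change.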

A site's col_version is monotonic per cell, so an incoming change whose col_version equals the local clock entry AND whose site_id matches the entry's author is the identical change: reject it without reading the local value from the data table. The site ordinal rides along in the col_version select (same clock row, no extra cost) and the incoming site's ordinal resolves through the existing site_id memo.

True concurrent edits (equal versions from different sites) still fall through to the deterministic value comparison, verified by an explicit two-peer convergence check and the Python correctness suite. Also folds the duplicated ordinal lookup in set_winner_clock into the shared lookup_site_ordinal helper.

Idempotent re-import of 80k changes: 0.063s -> 0.035s (0.179s before this PR).
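The decision described in this commit can be sketched as a small classifier. This is an illustration under assumptions, not the PR's code: the enum, the function name, and the non-negative ordinal precondition are hypothetical, and the "newer wins / older loses" branches reflect ordinary last-writer-wins handling assumed here rather than anything stated in the commit message. Only the two equal-version cases come from the text above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical outcome labels for one incoming column change. */
typedef enum {
  MERGE_REJECT_IDENTICAL, /* same version, same author: nothing to do */
  MERGE_REJECT_OLDER,     /* assumed: lower col_version loses */
  MERGE_ACCEPT_NEWER,     /* assumed: higher col_version wins */
  MERGE_COMPARE_VALUES    /* same version, different authors: tie-break on value */
} MergeDecision;

MergeDecision classify_incoming(int64_t localColVersion,
                                int64_t localSiteOrdinal,
                                int64_t incomingColVersion,
                                int64_t incomingSiteOrdinal) {
  /* Sketch assumption: ordinals are non-negative integers. */
  assert(localSiteOrdinal >= 0 && incomingSiteOrdinal >= 0);
  if (incomingColVersion > localColVersion) return MERGE_ACCEPT_NEWER;
  if (incomingColVersion < localColVersion) return MERGE_REJECT_OLDER;
  /* Equal versions: a site's col_version is monotonic per cell, so the same
   * author at the same version means the very same change. */
  if (incomingSiteOrdinal == localSiteOrdinal) return MERGE_REJECT_IDENTICAL;
  /* True concurrent edit: fall through to the deterministic value comparison. */
  return MERGE_COMPARE_VALUES;
}
```

The saving comes from the `MERGE_REJECT_IDENTICAL` branch: it is decided from the clock row alone, so an idempotent re-import never has to read the stored value from the data table.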

sinkingsugar added 2 commits June 13, 2026 09:19

sinkingsugar merged commit d0540ac into pure-c-port Jun 13, 2026
8 checks passed

sinkingsugar merged 2 commits into pure-c-port from perf/fast-changes-merge
